Grok 4 Fast

Grok 4 Fast | xAI

https://note.com/npaka/n/n83f9d4883b8e

"Grok 4 Fast" matches "Grok 4" on benchmarks while using 40% fewer thinking tokens on average.

Because the per-token price also dropped sharply, the overall price fell by 98%.
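The two figures above (40% fewer thinking tokens, 98% lower price) imply a per-token price cut, which can be backed out with simple arithmetic. This is a rough sketch that assumes cost is just thinking tokens multiplied by a single per-token price; real API pricing has separate input and output rates, and the function name here is made up for illustration. It is not xAI's price list.

```python
def implied_per_token_price_ratio(token_ratio: float, cost_ratio: float) -> float:
    """Return new_price / old_price per token.

    Assumes the simplified model: cost = tokens * price_per_token,
    so cost_ratio = token_ratio * price_ratio.
    """
    if token_ratio <= 0:
        raise ValueError("token_ratio must be positive")
    return cost_ratio / token_ratio


# Figures quoted in the note above.
token_ratio = 1 - 0.40  # 40% fewer thinking tokens -> 0.60x tokens
cost_ratio = 1 - 0.98   # 98% lower price -> 0.02x cost

price_ratio = implied_per_token_price_ratio(token_ratio, cost_ratio)
print(f"implied per-token price ratio: {price_ratio:.4f}")
print(f"implied per-token price cut: about {1 / price_ratio:.1f}x")
```

Under this simplified model the token savings alone account for only a small part of the drop; most of the 98% has to come from the per-token price, which would need to be roughly 1/30 of the old one.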

https://gyazo.com/f15dc5c53e5f300f0be4b4a5dec16e92

Taken on its own, this looks outrageous 基素.icon

2026-02-20 I haven't heard any buzz about it at all. What's going on?

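Sonnet 4.6.icon The benchmark numbers are real, but they were measured in a context that doesn't match how the model gets used, and models turn over so fast that it went stale before anyone could talk about it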

What happened

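As the Scrapbox page says, Grok 4 Fast was released in September 2025 and made a splashy debut: "40% fewer thinking tokens than Grok 4, a 98% cost reduction, LMArena ＃8." The person who announced it (Dustin Van Tran, who moved to xAI from Google DeepMind) was also quite bullish about it.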

Where things stand now (February 2026)

Checking today's Text leaderboard on Arena (formerly LMArena):

Grok-4.1-thinking (the successor, reasoning variant): 6th (score 1474) ← reasonably high

Grok-4.1 (non-reasoning variant): 10th (score 1463)

Grok-4-fast-chat (the original Grok 4 Fast): 42nd (score 1422) ← a big drop

Grok 4 Fast, touted as "#8" at launch, has now fallen to 42nd, and xAI itself already recommends moving to Grok 4.1 Fast, released in November 2025. (Artificial Analysis also pointed to it as the "newer model".)

Why there is no buzz

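First, Grok 4 (the original, heavier model) is specialized for hard math and logic benchmarks such as ARC-AGI and AIME, so it is hard to feel any benefit from it in day-to-day coding use.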

Grok 4 Fast is the "cheap and fast version" of that model, so for practical coding assistance of the Claude Code kind it has no standout advantage over Gemini 2.5 Pro or the Claude Opus 4 family.

Artificial Analysis likewise rates it "intelligence ＃13/78, no reasoning": decent for a non-reasoning model, but hardly dominant.

On top of that, API access to Grok is simply less widespread than for Anthropic, OpenAI, or Google, and it is not the default model in tools such as Claude Code, so word of mouth spreads slowly in the developer community.

The data shows that its performance on coding tasks is quite low. Summary of the numbers checked today: Arena Code leaderboard (February 20, 2026, 44 models)

https://arena.ai/ja/leaderboard/code

table:_
 Model	Rank	Score
 Claude Opus 4-6	1	1561
 grok-4-1-fast-reasoning	37	1235
 grok-4.1-thinking	40	1204
 grok-4-fast-reasoning	42	1153
 grok-code-fast-1	43	1140

The Code Arena lists only 44 models, so these ranks are close to the very bottom.
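To get a feel for what those score gaps mean, the scores can be turned into a rough head-to-head win rate against the top model. This is only a sketch: it assumes the Arena scores sit on a conventional Elo-like scale (base 10, 400 points per decade), which is an assumption about the leaderboard rather than something stated on this page, and the function name is made up for illustration.

```python
def expected_win_probability(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Expected probability that model A is preferred over model B.

    Assumes an Elo-style logistic model: P(A beats B) = 1 / (1 + 10^((B - A) / scale)).
    """
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / scale))


# Scores copied from the Arena Code leaderboard table above (2026-02-20).
code_arena_scores = {
    "Claude Opus 4-6": 1561,
    "grok-4-1-fast-reasoning": 1235,
    "grok-4.1-thinking": 1204,
    "grok-4-fast-reasoning": 1153,
    "grok-code-fast-1": 1140,
}

top_score = code_arena_scores["Claude Opus 4-6"]
win_rates = {
    model: expected_win_probability(score, top_score)
    for model, score in code_arena_scores.items()
}

for model, p in win_rates.items():
    print(f"{model}: {p:.1%} expected win rate vs. Claude Opus 4-6")
```

Under that assumption, a gap of about 400 points means the lower-ranked model would be preferred in fewer than one in ten head-to-head comparisons against the top entry.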

On the Aider Polyglot benchmark (real code-editing tasks), Grok 4 itself does respectably at 79.6%, good for 7th place, but there is no data at all for Grok 4 Fast (it has not even been tested).

This benchmark is out of date; the Opus model on it is the 202505 one 基素.icon

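Why is it weak at coding? Grok 4's design philosophy is specialized toward "solving math and logic puzzles," and the practical skills coding requires are a separate matter: accurately generating diffs while referring to a long context, chaining tool calls appropriately, and reading error messages and fixing the code. It is true that Grok 4 itself tops ARC-AGI, but that measures the ability to "solve problems it has never seen," which is not the same thing as being good at engineering assistance.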

https://x.com/dustinvtran/status/1969183617881686405

I departed Google DeepMind after 8 years. So many fond memories—from early foundational papers in Google Brain (w/ @noamshazeer @ashvaswani @lukaszkaiser on Image Transformer, Tensor2Tensor, Mesh TensorFlow) to lead Gemini posttraining evals to catch up & launch in 100 days, then leading the team to leapfrog to LMArena ＃1 (and stay there for over a year!), and finally working on the incredible reasoning innovations for Gemini’s IMO & ICPC gold medals (w/ @HengTze @quocleix ).

Gemini has been a wild journey from one paradigm to another: first, revamping our LaMDA model (the first instruction-like chatbot!) from an actual chatbot to long contentful responses with RLHF; then, reasoning and deep thinking by training over long thinking chains, novel environments, and reward heads. When we first started, public sentiment was bad. Everyone thought Google was doomed to fail due to its search legacy and organizational politics. Now, Gemini is consistently ＃1 in user preference and spearheading new scientific accomplishments, and everyone thinks Google winning is obvious. 😂 (It also used to be the case that OpenAI would jump the AI newscycle by announcing before us from a backlog of ideas for every new Google release; safe to say that backlog is empty.)

I have since joined xAI. The recipe is well-known. Compute, data, and O(100) brilliant, hard-working people are all that’s needed to obtain a frontier-level LLM. xAI *really* believes in this. For compute, even at Google I have never experienced this # of chips per capita (& 100K+ GB200/300K’s are incoming with Colossus 2). For data, Grok 4 made the biggest bet in scaling RL & posttraining. xAI is making new bets to scale data, deep thinking, and the training recipe. And the team is quick. No company has gotten to where xAI is today in AI capabilities in as little as time. As @elonmusk says, a company’s first- and second-order derivatives are the most important: xAI’s acceleration is the highest.

I’m excited to announce that in my first few weeks, we launched Grok 4 Fast. Grok 4 is an amazing reasoning model, still the top on ARC-AGI and new benchmarks like FinSearchComp. But it’s slow and was never really targeted for general-purpose user needs. Grok 4 Fast is the best mini-class model—on LMArena, it is ＃8 (Gemini 2.5 Flash is ＃18!), and on core reasoning evals like AIME, it is on par with Grok 4 while 15x cheaper. S/o to @LiTianleli @jinyilll @ag_i_2211 @s_tworkowski @keirp1 @yuhu_ai_